Hybrid syllable/triphone speech synthesis
Abstract
This paper investigates the syllable, an alternative phonetic unit to the phone, in the context of speech synthesis. Several approaches to syllable modelling are proposed and evaluated within a statistical framework that uses hidden Markov models to create the acoustic unit inventory. To make it possible to synthesize arbitrary text, the syllable inventories were supplemented with triphones, resulting in hybrid syllable/triphone inventories. Listening tests were conducted both to assess the quality of the synthetic speech produced with the hybrid inventories and to choose the best approach to syllable modelling. The resulting synthetic speech is highly intelligible and fluent. Although speech generated with the baseline triphone inventory was rated slightly better, the results of these first experiments with syllable modelling are very promising.
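The idea of supplementing a syllable inventory with triphones can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_units`, the unit label format, the `sil` padding symbol, and the rule "use a syllable unit when it is in the inventory, otherwise one triphone per phone" are all assumptions made for illustration only.

```python
from typing import List, Set, Tuple

Syllable = Tuple[str, ...]  # a syllable as a tuple of phone symbols (assumed representation)


def select_units(syllables: List[Syllable],
                 syllable_inventory: Set[Syllable]) -> List[str]:
    """Cover an utterance with syllable units where available, triphones otherwise.

    Hypothetical helper: the label formats "syl:..." and "tri:l-p+r" are
    illustrative, not taken from the paper.
    """
    # Flatten to a phone sequence, padded with silence so every phone has
    # a left and a right context.
    phones = ["sil"] + [p for syl in syllables for p in syl] + ["sil"]
    units: List[str] = []
    pos = 1  # index in `phones` of the first phone of the current syllable
    for syl in syllables:
        if syl in syllable_inventory:
            # The whole syllable is modelled as a single unit.
            units.append("syl:" + "_".join(syl))
        else:
            # Fall back to one context-dependent triphone per phone.
            for i in range(pos, pos + len(syl)):
                units.append(f"tri:{phones[i - 1]}-{phones[i]}+{phones[i + 1]}")
        pos += len(syl)
    return units


if __name__ == "__main__":
    inventory = {("d", "o")}
    print(select_units([("d", "o"), ("b", "r", "i")], inventory))
```

Because every phone can be covered by some triphone, the fallback lets the sketch handle any input, which mirrors the abstract's stated reason for adding triphones: synthesizing arbitrary text.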
Similar resources
Syllable-based and hybrid acoustic models for Amharic speech recognition
This paper presents the results of our experiments on the use of hybrid acoustic units in speech recognition and the use of syllable and hybrid acoustic models (AM) in morpheme-based speech recognition. Although hybrid AMs did not bring improvement in speech recognition performance when words are used as dictionary entries and units in a language model (LM), we observed a significant word error ...
Text-to-audio-visual speech synthesis based on parameter generation from HMM
This paper describes a technique for synthesizing auditory speech and lip motion from an arbitrary given text. The technique is an extension of the visual speech synthesis technique based on an algorithm for parameter generation from HMM with dynamic features. Audio and visual features of each speech unit are modeled by a single HMM. Since both audio and visual parameters are generated simultan...
Syllable-based acoustic modeling for Japanese spontaneous speech recognition
We study a syllable-based acoustic modeling method for Japanese spontaneous speech recognition. Traditionally, mora-based acoustic models have been adopted for Japanese read speech recognition systems. In this paper, syllable-based and mora-based units are clearly distinguished in their definitions, and syllables are shown to be more suitable as an acoustic model for Japanese spontaneous ...
Effect of Prosodic Structure on Segmental Variants
There are a large number of segmental variants in a natural speech corpus. It is very important to label those variants correctly for a corpus-based TTS system. We successfully applied automatic triphone segmentation to a large speech corpus with syllable segmentation and prosodic annotation. In this paper, we also report (1) recognition error analysis based on prosodic structure, and (2) the re...
A nonlinear unit selection strategy for concatenative speech synthesis based on syllable level features
This paper describes an improved algorithm, motivated by fuzzy logic theory, for the selection of speech segments for concatenative synthesis from a huge database. Triphone HMM clustering is employed as an adaptive measure for articulatory similarity within a given database. Stress level contours are evaluated in the context of their surrounding vocalic peaks. The algorithm uses a beam search t...
Publication date: 2005